Smart Paper Notes

Medical Visual Question Answering: A Survey

Zhihong Lin , Donghao Zhang , Qingyi Tao , Danli Shi , Gholamreza Haffari , Qi Wu , Mingguang He , Zongyuan Ge

Categories: Computer Vision | Artificial Intelligence

2021-11-19

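Medical Visual Question Answering (VQA) combines medical artificial intelligence with the popular VQA challenge. Given a medical image and a clinically relevant question in natural language, a medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been studied extensively, medical VQA still needs specific investigation and exploration because of its task features. In the first part of this survey, we cover and discuss the publicly available medical VQA datasets in terms of data source, data quantity, and task features. In the second part, we review the approaches used in medical VQA tasks. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions.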

PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

Categories: Computer Vision

2023-01-03

Blind image quality assessment (BIQA) remains challenging because the diversity of distortions and the variation in image content complicate distortion patterns across different scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). Together they help the model learn complex distortion patterns and optimize the regression problem in an easy-to-hard order that mirrors human learning. To verify the effectiveness of PMT-IQA, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.

Generalised agent for solving higher board states of tic tac toe using Reinforcement Learning

Bhavuk Kalra

Categories: Artificial Intelligence

2022-12-23

Tic tac toe is among the most well-known games. It has already been shown to be a biased game: the first player has more chances to win, leaving the opponent only a draw or a loss when both players play optimally. Thus, on average, the majority of games end in a draw. Most recent research on solving a tic tac toe board state employs strategies such as genetic algorithms, neural networks, co-evolution, and evolutionary programming. However, these approaches deal with the trivial 3×3 board, and very little research has addressed a generalized algorithm for 4×4, 5×5, 6×6, and larger boards. The Min-Max algorithm does exist, but its recursive implementation takes a long time to find an ideal move. A sample has been created at https://bk-tic-tac-toe.herokuapp.com/ to demonstrate this. This is the main problem this study aims to solve, i.e., providing a generalized (approximate, learning-based) algorithm that makes precise moves on higher board states of tic tac toe in a short time. The code changes needed to accommodate higher board states are also nominal. The idea is to pose tic tac toe as a well-posed learning problem. The results are promising, giving a high win-to-draw ratio with each epoch of training. This study could also encourage other researchers to apply the same algorithm to similar board games such as Minesweeper, Chess, and Go to find efficient strategies and compare results.
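
To illustrate the exhaustive Min-Max search that the abstract contrasts with its learning-based approach, here is a minimal sketch. It is not the author's code: the function names, the tuple board encoding, and the win rule (a full row, column, or diagonal of length n) are assumptions made for illustration.

```python
from functools import lru_cache

def winner(board, n):
    """Return 'X' or 'O' if that player fills a whole row, column, or diagonal, else None."""
    lines = []
    for i in range(n):
        lines.append([i * n + j for j in range(n)])        # row i
        lines.append([j * n + i for j in range(n)])        # column i
    lines.append([i * n + i for i in range(n)])            # main diagonal
    lines.append([i * n + (n - 1 - i) for i in range(n)])  # anti-diagonal
    for line in lines:
        first = board[line[0]]
        if first != ' ' and all(board[k] == first for k in line):
            return first
    return None

@lru_cache(maxsize=None)
def minimax(board, player, n=3):
    """Game value for X under optimal play: +1 if X wins, 0 for a draw, -1 if O wins."""
    w = winner(board, n)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0
    nxt = 'O' if player == 'X' else 'X'
    # Recurse into every empty cell; memoization avoids re-solving repeated positions.
    values = [minimax(board[:i] + (player,) + board[i + 1:], nxt, n)
              for i, cell in enumerate(board) if cell == ' ']
    return max(values) if player == 'X' else min(values)

empty = (' ',) * 9
value = minimax(empty, 'X')   # value of the empty 3x3 board with X to move
```

The same function accepts larger boards through `n`, but the number of positions to search grows very quickly with board size, which is the cost the abstract refers to.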

RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

Zhikai Li , Junrui Xiao , Lianwei Yang , Qingyi Gu

Categories: Computer Vision | Machine Learning

2022-12-16

Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and initially apply channel-wise quantization and log$\sqrt{2}$ quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, with only slight accuracy or computational costs. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that RepQ-ViT, without hyperparameters and expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
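
The difference between log2 and log$\sqrt{2}$ quantization mentioned above can be sketched as follows. This is a hedged illustration of a common form of logarithmic quantizer, not the paper's implementation: the function name `log_quant`, the 4-bit default, and the restriction to inputs in (0, 1] (as post-Softmax values are) are assumptions. The scale reparameterization step itself is not shown.

```python
import math

def log_quant(x, bits=4, base_sqrt2=True):
    """Quantize x in (0, 1] to an integer exponent code; return (code, dequantized value)."""
    # A base-sqrt(2) grid has two levels per power of two; a base-2 grid has one.
    steps_per_octave = 2 if base_sqrt2 else 1
    qmax = 2 ** bits - 1
    code = round(-steps_per_octave * math.log2(x))
    code = min(max(code, 0), qmax)          # clip to the representable range
    return code, 2.0 ** (-code / steps_per_octave)

x = 0.7
code_s2, approx_s2 = log_quant(x, base_sqrt2=True)    # nearest level on the sqrt(2) grid
code_2, approx_2 = log_quant(x, base_sqrt2=False)     # nearest level on the base-2 grid
```

The base-$\sqrt{2}$ grid is denser, so values that fall between two powers of two can land on a closer level, while the base-2 grid maps directly to bit shifts at inference time.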

UBARv2: Towards Mitigating Exposure Bias in Task-Oriented Dialogs

Yunyi Yang , Hong Ding , Qingyi Liu , Xiaojun Quan

Categories: Natural Language Processing

2022-09-15

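This paper studies the exposure bias problem in task-oriented dialog (TOD) systems, where the content the model generates over multiple turns drives the dialog context away from the ground-truth distribution seen at training time, introducing error propagation and harming the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session-level sampling, which explicitly exposes the model to sampled generated content in the dialog context during training. In addition, we employ dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model. The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ, and extensive experiments show the effectiveness of the proposed methods.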

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

Zhikai Li , Mengjuan Chen , Junrui Xiao , Qingyi Gu

Categories: Computer Vision

2022-09-13

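Data-free quantization can potentially address data privacy and security concerns in model compression, and has therefore been widely investigated. Recently, PSAQ-ViT designed a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which drives the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models for image classification, object detection, and semantic segmentation tasks. With a naive quantization strategy and without access to real-world data, PSAQ-ViT V2 consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13 top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a potential and practical solution in real-world applications involving sensitive data. Code will be released and merged at: https://github.com/zkkli/psaq-vit.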

I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

Zhikai Li , Qingyi Gu

Categories: Computer Vision

2022-07-04

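Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. However, these models have considerable storage and computational overheads, making their deployment and efficient inference on edge devices challenging. Quantization is a promising approach to reducing model complexity. Unfortunately, existing efforts to quantize ViTs use simulated quantization (also known as fake quantization), which still relies on floating-point arithmetic during inference and thus contributes little to model acceleration. In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, which enables ViTs to perform the entire computational graph of inference with integer operations and bit-shifting, and without any floating-point operations. In I-ViT, linear operations (e.g., MatMul and Dense) follow an integer-only pipeline with dyadic arithmetic, and non-linear operations (e.g., Softmax, GELU, and LayerNorm) are approximated by the proposed lightweight integer-only arithmetic methods. In particular, I-ViT applies the proposed Shiftmax and ShiftGELU, which are designed to use integer bit-shifting to approximate the corresponding floating-point operations. We evaluate I-ViT on various benchmark models, and the results show that integer-only INT8 quantization achieves accuracy comparable to (or even higher than) the full-precision (FP) baseline. Furthermore, we use TVM for practical hardware deployment on the GPU's integer arithmetic units, achieving a 3.72~4.11$\times$ inference speedup compared to the FP model.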
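
The dyadic arithmetic mentioned for linear operations can be sketched as below: a floating-point rescaling factor is approximated by a dyadic number b / 2^c, so rescaling needs only an integer multiply and a right shift. This is an illustrative sketch, not the I-ViT code; the function names, the 31-bit shift, and the example scale 0.0123 are assumptions.

```python
def to_dyadic(scale, shift_bits=31):
    """Approximate a positive float scale by an integer b so that scale ~= b / 2**shift_bits."""
    return round(scale * (1 << shift_bits)), shift_bits

def requantize(acc, multiplier, shift_bits):
    """Rescale an integer accumulator using only an integer multiply, add, and right shift."""
    rounding = 1 << (shift_bits - 1)   # adding half the divisor makes the shift round to nearest
    return (acc * multiplier + rounding) >> shift_bits

# The float-to-dyadic conversion happens once, offline; inference uses integers only.
multiplier, shift_bits = to_dyadic(0.0123)
rescaled = requantize(1000, multiplier, shift_bits)   # integer stand-in for 1000 * 0.0123
```

A wider shift gives a closer approximation of the scale at the cost of a wider intermediate product, which is the trade-off a real deployment would have to size against its accumulator width.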

Patch Similarity Aware Data-Free Quantization for Vision Transformers

Zhikai Li , Liping Ma , Mengjuan Chen , Junrui Xiao , Qingyi Gu

Categories: Computer Vision

2022-03-04

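Vision transformers have recently achieved great success on various computer vision tasks; nevertheless, their high model complexity makes them challenging to deploy on resource-constrained devices. Quantization is an effective approach to reducing model complexity, and data-free quantization, which can address data privacy and security concerns during model deployment, has received widespread interest. Unfortunately, all existing methods, such as BN regularization, were designed for convolutional neural networks and cannot be applied to vision transformers, whose model architectures differ significantly. In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, which generates "realistic" samples based on the unique properties of vision transformers for calibrating the quantization parameters. Specifically, we analyze the properties of the self-attention module and reveal a general difference (patch similarity) in how it processes Gaussian noise and real images. This insight guides us to design a relative value metric that optimizes Gaussian noise to approximate real images, which are then used to calibrate the quantization parameters. Extensive experiments and ablation studies on various benchmarks validate the effectiveness of PSAQ-ViT, which can even outperform real-data-driven methods.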

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging

Shuangrui Ding , Maomao Li , Tianyu Yang , Rui Qian , Haohang Xu , Qingyi Chen , Jue Wang

Categories: Computer Vision

2021-09-30

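In light of the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ a contrastive loss to facilitate video representation learning. However, when two augmented views of a video are naively pulled closer, the model tends to learn the common static background as a shortcut and fails to capture motion information, a phenomenon dubbed background bias. This bias weakens the model's generalization ability and leads to worse performance on downstream tasks such as action recognition. To alleviate this bias, we propose Foreground-background Merging (FAME), which deliberately composes the moving foreground region of a selected video onto the static background of other videos. Specifically, without any off-the-shelf detector, we extract the moving foreground from the background regions via frame difference and color statistics, and shuffle the background regions among the videos. By leveraging the semantic consistency between the original clips and the fused ones, the model focuses more on motion patterns and is debiased from the background shortcut. Extensive experiments demonstrate that FAME can effectively resist background cheating and thus achieves state-of-the-art performance on downstream tasks across the UCF101, HMDB51, and Diving48 datasets.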
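
The foreground extraction and merging step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the FAME implementation: it uses only frame differences (the color-statistics refinement is omitted), frames are small grayscale grids stored as nested lists, the background is a single frame taken from another video, and `motion_mask`, `merge`, and `keep_ratio` are hypothetical names.

```python
def motion_mask(frames, keep_ratio=0.5):
    """Mark the pixels with the largest summed frame-to-frame change as foreground (1), the rest as 0."""
    h, w = len(frames[0]), len(frames[0][0])
    # Accumulate the absolute difference between consecutive frames at each pixel.
    motion = [[sum(abs(frames[t + 1][y][x] - frames[t][y][x])
                   for t in range(len(frames) - 1))
               for x in range(w)] for y in range(h)]
    flat = sorted((v for row in motion for v in row), reverse=True)
    k = max(1, int(keep_ratio * h * w))
    threshold = flat[k - 1]
    # Pixels that never change are always treated as background.
    return [[1 if v >= threshold and v > 0 else 0 for v in row] for row in motion]

def merge(fg_frames, bg_frame, mask):
    """Paste the masked foreground of each frame onto a static background frame from another video."""
    return [[[f[y][x] if mask[y][x] else bg_frame[y][x]
              for x in range(len(mask[0]))] for y in range(len(mask))]
            for f in fg_frames]

clip = [[[0, 0], [0, 0]], [[9, 0], [0, 0]]]   # two 2x2 frames; only the top-left pixel moves
background = [[5, 5], [5, 5]]                  # a frame from a different video
mask = motion_mask(clip, keep_ratio=0.25)
fused = merge(clip, background, mask)
```

The fused clip keeps the moving pixels of the original and replaces everything else, so a contrastive pair built from the original and fused clips shares motion but not background.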
